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Abstract 

Following up on Barabasi's recent letter to Nature [435, 207-211 (2005)], we systematically investigate 
the time series of e-mail usage for 3,188 users at a university. We focus on two quantities for each user: 
the time interval between consecutively sent e-mails (interevent time), and the time interval between when 
a user sends an e-mail and when a recipient sends an e-mail back to the original sender (waiting time). We 
perform a standard Bayesian model selection analysis that demonstrates that the interevent times are well- 
described by a single log-normal while the waiting times are better described by the superposition of two 
log-normals. Our analysis rejects the possibility that either measure could be described by truncated power- 
law distributions with exponent a ~ 1. We also critically evaluate the priority queuing model proposed 
by Barabasi to describe the distribution of the waiting times. We show that neither the assumptions nor 
the predictions of the model are plausible, and conclude that a theoretical description of human e-mail 
communication patterns remains an open problem. 
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I. INTRODUCTION 



Human beings are extraordinarily complex agents. Remarkably, in spite of that complexity 
a number of striking statistical regularities are known to describe individual and societal human 
behavior llll|2l|3l|4j,|5l|6|]. These regularities are of enormous practical importance because of the 
influence of individual behaviors on social and economic outcomes. 

Even though the analysis of social and economic data has a long and illustrious history, from 
Smith [Q] to Pareto iQ] and to Zipf Jg], the recent availability of digital records has made it much 
easier for researchers to quantitatively investigate various aspects of human behavior. In particular, 
the availability and omnipresence of e-mail communication records is attracting much attention 



tne availability and 



Recently, Barabasi studied the e-mail records of users at a university and reported two patterns 
in e-mail communication J12I : the time interval between two consecutive e-mails sent by the 
same user, which we will denote as the interevent time r, and the time interval between when 
a user sends an e-mail and when a recipient sends an e-mail back to the original sender, which 
we will denote as the waiting time r w , follow power-law distributions which decay in the tail 
with exponent a ~ 1. Additionally, Barabasi proposed a priority queuing model that reportedly 
captures the processes by which individuals reply to e-mails, thereby predicting the probability 
distribution of r w . 

Here, we demonstrate that the empirical results reported in Ref. J13I1 are an artifact of the 
data analysis. We perform a standard Bayesian model selection analysis that demonstrates that 
the interevent times are well-described by a single log-normal while the waiting times are better 
described by the superposition of two log-normals. Our analysis rejects beyond any doubt the 
possibility that the data could be described by truncated power-law distributions. 

We also critically evaluate the priority queuing model proposed by Barabasi to describe the 
observed waiting time distributions. We show that neither the assumptions nor the predictions of 
the model are plausible. We thus conclude that the description of human e-mail communication 
patterns remains an open problem. 

The remainder of this paper is organized as follows. In Section II, we describe the preprocessing 
of the data. We then analyze the distribution of interevent times (Section ITTTb and the distribution 
of waiting times ("Section ITvT) . Finally, in Section |V| we investigate the priority queuing model of 
Ref. bl. 
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H. PREPROCESSING OF THE DATA 

We consider here the database investigated by Barabasi [13], which was also the focus of an 
earlier paper by Eckmann et al. [12]. This database consists of e-mail records for 3,188 e-mail 
accounts at a university covering an 83-day period. Each record comprises a sender identifier, a 
recipient identifier, the size of the e-mail, and a time stamp with a precision of one second. Before 
describing our analysis of the data, we first note some important features of the data which impact 
the analysis. 

The first important fact is that the data were gathered at an e-mail server, not from the e- 
mail clients of the individual users. It is quite possible that some users have e-mail clients, like 
Microsoft Outlook, which permit users to send multiple e-mails at once regardless of when the 
e-mails were composed. Moreover, servers may parse long recipient lists into several shorter lists 
Jisl . For this reason, e-mails to multiple recipients were occasionally recorded in the server as 
multiple e-mails. Each of these duplicate e-mails was then sent in rapid succession to a different 
subset of the list of recipients in the actual e-mail. Both the client-side and server-side uncertainties 
introduce artifacts in the time series of interevent times for each user as it could appear that a user 
is sending several e-mails over a very short time interval. 

To minimize these uncertainties, we preprocessed the data in order to focus on actual human 
behavior. First, we identify sets of e-mails sent by a user that have the exact same size but whose 
time stamp differs by at most five seconds 1 . We then remove all but the first e-mail from the 
time series of e-mails sent, while adjusting the list of recipients to the first e-mail to include all 
recipients in the removed e-mails 1 . 

An additional important fact to note is that some of the e-mail accounts do not belong to "typ- 
ical" users. For example, User 1962 only sent 5 e-mails while receiving 2,284 e-mails. This 
individual's e-mail use is too infrequent to provide useful information on human dynamics. Mean- 
while User 4099 sent 9,431 e-mails while receiving no e-mails. Although it cannot be confirmed 

1 Five seconds corresponds with the average minimal bound on humanly possible interevent times based on the 
experiment in Fig. [2] 

2 A more aggressive preprocessing method would also remove blind-carbon-copied (BCC) e-mails. The basic idea 
is that an e-mail with BCC recipients will have its size increased by a few bytes due to the addition of outgoing 
headers. E-mails to the BCC recipients would be sent by the server shortly after the e-mail to visible recipients and 
would thus increase the number of very small interevent times. We choose to err on the side of caution and not 
attempt to remove e-mails with BCC recipients, as their detection is more subjective. 
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FIG. 1: Preprocessing of the data. A, The cumulative distribution of the number of e-mails sent by the 3,188 
users over 83 days. To avoid characterizing users which use e-mail infrequently, we consider the 1,212 users 
which sent at least 1 1 e-mails over 83 days (shaded region). Note that this removes the 759 users which sent 
no e-mails and the 370 users that sent one e-mail. B, The distribution of the ratio of the number of e-mails 
received to the number of e-mails sent is well-described by a log-normal (red line). We use this fact to 
develop a criteria to identify typical e-mail users that sent at least 1 1 e-mails over 83 days and differentiate 
them from bulk e-mail accounts, listserves, and e-mail accounts that are rarely used by their owners. We 
keep the 1,152 users which fall within three standard deviations of the mean (shaded region). 



due to the anonymous nature of the data, this e-mail account was in all likelihood used for bulk 
e-mails, implying that it cannot provide information on human e-mail usage. 

To avoid having our analysis distorted, we first restrict our attention to users which sent at least 
1 1 e-mails over the 83-day experiment, yielding a minimum of 10 interevent times. Our reasoning 
is that users sending fewer e-mails do not use e-mail regularly enough to allow us to truly infer 
patterns of human dynamics. This procedure excludes 1,976 of the 3,188 original e-mail accounts. 

Next we examine the ratio of the number of e-mails received to the number of e-mails sent to 
determine what constitutes a "typical" user. This ratio is well-described by a log-normal distribu- 
tion, and we use this fact to consider only those users in our study who are within three standard 
deviations from the mean. This added constraint excludes an additional 46 users. We thus focus 
here on the 1,152 users who fulfill the above criteria (Fig.[l]). 



III. INTEREVENT TIMES 

Reference lol reports that the probability distribution of time intervals r between consecutive 
e-mails sent by an individual follows a power-law P(t) ~ r~ a with a ~ 1. A basic examination 
of Barabasi's results, however, quickly reveals a number of issues. 

1. Figure 2a of Ref. L 1 3] features three bins corresponding to interevent times r < 3 seconds, 



4 



, , 6 -oo oo 

CO 

oo oooooo o o oo oo 



o oo o o 



5 10 15 20 25 

Index of sent e-mail 



— r~ 

o 



ooo oo oooo o o 



o o 

J , l_ 



o o o 

J I I 



5 10 15 20 

Index of sent e-mail 




10"' 



10 u 



10' 
T[S] 



10 



10" 



FIG. 2: The statistical analysis presented in Fig. 2a of Ref. II 311 . A, Estimation of a lower bound on 
interevent times, r. Two of us sent about twenty e-mails trying to minimize the time interval between con- 
secutive e-mails. To be as fast as possible, we sent replies to an e-mail already in our inboxes. Additionally, 
we did not even write any text or read the e-mail to which we were responding. We find that humans need 
at least 3 seconds to send consecutive e-mails. B, Reproduction of Fig.2a of Ref. Hill obtained with Vis- 
taMetrix [ llj and, C, the same data with the boundaries of the bins clearly marked. We assumed that the 
data points in Fig. 2a of Ref. 11311 were placed in the middle of the bin. Note that there is a bin recording 
data for r < 1 second, whereas the data have a resolution of one second. The shaded bins indicate values 
with r < 3 second, which contain 9% of all events for the unidentified user. 



an unphysical interval (Fig.0\)- The events in those bins in fact account for 9% of all events. 







2. Figure 2a of Ref. [13] features at least one bin confined to interevent times r < 1 second 
(Fig.|2B-C) while the data have a precision of one second II 1211 . 



We next quantitatively compare the plausibility of our log-normal hypothesis with the plausi- 
bility of the power-law hypothesis of Ref. for interevent times. To simplify the analysis, we do 
not consider r, but its logarithm. If a random variable r is log-normally distributed, then u = ln(r) 
follows a Gaussian distribution, whereas if r is distributed according to a power-law with exponent 
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FIG. 3: Bayesian model selection protocol for interevent times. A, Cumulative distributions of u = ln(r) 
for three users in the database. The top panels show data and Gaussian model predictions for the entire range 
of u, whereas the bottom panels show data and power-law model for intermediate values of u. B, Scatter 
plot of i 3 KS,iog-normai/-f , KS,power-iaw> the ratio of the two Pks values, for all available users depending 
on the number of consecutive e-mails sent for each user. The larger circles colored green, yellow, and red 
correspond to the data shown in (A). Note that there are 2 users for which the ratio of Pks is greater than 
10 20 . Those users are indicated by the arrows. C, Recursively calculated posterior probability of accepting 
the power-law model. We use Bayesian model selection to recursively calculate the posterior probability 
of selecting the power-law distribution for two different prior probabilities: P(power — law) = 0.95 (solid 
black) and P(power — law) = 0.50 (red dashed). In both cases, the posterior probability of selecting the 
power-law model vanishes after considering 932 of the 1,016 users meeting our preprocessing criteria. 



a = 1, then u is uniformly distributed in the interval [ln(r min ), ln(r max )]. Specifically, for 



P(t) oc 



, otherwise 



(1) 



the distribution of u = ln(r) is 



P(u) 



1 max "rnm 



0. 



otherwise 



(2) 



Barabasi [13] has argued that the power-law model is meant to describe only "intermediate" r 
values falling between 100 and 10,000 seconds. Since some users have a smaller range of r values 
than that interval, we test the agreement of the predictions of the power-law model only with data 
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in the interval [r min , r max ], where r min = inf {r|r > 100} and r max = sup{r|r < 10,000}. To 
properly specify the power-law distribution, we must have at least two data points in [r min , r max ). 
This constraint leads to the exclusion of an additional 136 users. 

We then use the Kolmogorov-Smirnov (KS) test Jl7] as a measure of the plausibility of a model 
given the user's data. Specifically, we compare the distribution of the logarithm of the interevent 
times for a given user to two candidate models: a Gaussian distribution and a uniform distribution. 

Importantly, there is absolutely no fitting in our analysis. The parameters of the Gaussian 
distribution, [i and a, are simply the sample average and standard deviation of u, while the uniform 
distribution is completely specified by u min and u max . Figure |3t> displays the ratio of the two KS 
probabilities versus number of e-mails sent for all users with at least two data points in the interval 

[^mim 7~max] ■ 

In order to determine which of the two models provides a more accurate description of the 
empirical data, we use the results of the KS test as inputs in a Bayesian model selection analy- 
sis OH- Bayes' rule states that 

1 > l Y.uP(£<\M t )P(M t )' w 

where P(A4j\£i) is the posterior probability of selecting model M.j given an observation Si, 
P(£i\Mj) = P KS (£i\Mj) is the probability of observing £ x given a model Mj, and P(Mj) is the 
prior probability of selecting model Aij. Assuming no prior knowledge about the correctness of 
the power-law and log-normal models, one would select P(log-normal) = P(power-law) = 0.5 
for each model. However, to eliminate any bias on our part, we perform the Bayesian model selec- 
tion analysis for two cases: (z) no prior knowledge, P(log-normal) = P(power-law) = 0.5, and 
(ii) the power-law model is far more likely to be correct, P(power-law) = 0.95. 

The availability of data for multiple users enables us to perform this analysis recursively to 
obtain posterior probabilities of selecting each model given the available data. Concretely, the 
analysis of the interevent times £ j from user i updates the posterior probabilities of the two models 
P(Aij\£i) using Eq. ©. These updated posterior probabilities are then used as prior probabilities 
for the next user i + 1. When all of the users have been included, this analysis reveals the posterior 
probability of the model given all of the available data. The Bayesian model selection analysis 
demonstrates that the likelihood of the truncated power-law model being a good description of the 
data vanishes to zero when all data is considered (Fig. [3jH). 
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IV. WAITING TIMES 



Before we present our analysis of the waiting times, we must note that the database collected 
and analyzed by Barabasi [ 13] is not particularly well-suited for identifying 
the waiting times for replying to an e-mail. The data merely records that an e-mail was sent by 
user A to user B at time t. The data does not specify whether the e-mail from A to B is, in fact, a 
reply to a prior message. Imagine the following scenario: user A sends an e-mail to user B. Three 
days later, user B sends an unrelated e-mail to user A. Barabasi's approach J13I . which we follow, 
is to classify this e-mail as a reply to the e-mail sent by user A three days earlier. As this case 
illustrates, the analysis of waiting times is significantly less reliable than that of interevent times. 

Reference J13I reports that the probability distribution of time intervals t w between receiving a 
message from a sender and sending another e-mail to that sender follows a power-law distribution 
P(t w ) ~ r~ a with a ~ 1. A cursory analysis of this result again reveals several problems. 



1. Figure 2b of Ref. [ 13] features three bins corresponding to waiting times r w < 6 seconds, 
an unphysical interval (Fig.|4K). The events in those bins account for 1% of all events. 

2. Figure 2b of Ref. [13] features two bins confined to waiting times r w < 1 second (Fig.|4j3- 
C) while the data have a precision of one second J3]. 

We characterize the actual distribution of waiting times r w following the same procedure out- 
lined in SectionHnl After parsing the data, we are left with 724 users which have sent at least 10 re- 
sponse e-mails over 83 days and have at least two waiting times in the interval 100 < r w < 10, 000 
seconds. We then perform KS tests and Bayesian model selection to determine whether the wait- 
ing times are better described by a power-law or log-normal distribution. The Bayesian model 
selection analysis demonstrates that the likelihood of the truncated power-law model being a good 
description of the data vanishes to zero when all data is considered (Fig.|5]). 



A. Double log-normal description 

Analysis of the data for the users with the largest number of replies suggests that t w may 
actually be better described by a superposition of two log-normal peaks: the first peak — which 
contains most of the probability mass — typically corresponds with waiting times of an hour, and 
the second peak typically corresponds with waiting times of two days. This finding prompted us 
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FIG. 4: The statistical analysis presented in Fig. 2b of Ref. 11311 . A, Estimation of a lower bound on waiting 
time, t w . Two of us sent about twenty replies to an e-mail already in our inboxes. To minimize the time 
required to do this, we did not read the e-mail to which we were responding but simply wrote "yes" at the 
top of our reply and then clicked send. We find that 6 seconds is the smallest waiting time feasible for a 
human. B, Reproduction of Fig. 2a of Ref. 11311 obtained with VistaMetrix [ 16] and, C, the same data but 
with the boundaries of the bins clearly marked. We assumed that the data points in (B) are placed in the 
middle of the bin. Note that there are two bins recording data for t w < 1 second, while the data have a 
resolution of one second. The shaded bins indicate values with t w < 6 seconds consisting of 1% of the 
data. 



to investigate whether the superposition of two log-normals would provide a better description of 
the data than a single log-normal. The probability function in this case has the functional form: 



0.5 



1 + / erf 



/ii 



1 - /) erf 



1^2 



(4) 



<TlV2 J V V2V2 , 

where [L\ and [12 are the means the two peaks, o\ and a 2 are the standard deviations of the two 
peaks, and / is the probability mass in the first peak. 

In order to conduct the KS tests and Bayesian model selection, we must first estimate the 
parameters of the double log-normal distribution, Eq. @. Unlike the earlier analyses, it is not 
possible to estimate the parameters of the distribution without performing a fit of Eq. © to the data. 
We perform maximum likelihood estimation | ni to determine the best estimate parametrization 
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FIG. 5: Bayesian model selection protocol for waiting times. A, Cumulative distributions of u = ln(r) for 
three users in the database. The top panels show data and power-law predictions over intermediate values 
of u whereas the middle and bottom panels depict Gaussian and double Gaussian model predictions for the 
entire range of u. B-C, Scatter plot of the ratio of the two Pks values for all available users depending on 
the number of e-mails sent for each user. The larger circles highlighted in green, yellow, and red correspond 
to the data shown in (A). Users outside the domain are indicated by the arrows. D-E, Recursively calculated 
posterior probability of accepting the power-law model for in comparison with the log-normal and double 
log-normal models. We use Bayesian model selection to recursively calculate the posterior probability of 
selecting the power-law distribution for two different prior probabilities: P(power — law) = 0.95 (solid 
black) and P(power — law) = 0.50 (red dashed). The posterior probability of selecting the power-law 
model vanishes after considering 140 and 49 of the 724 users for the log-normal and double log-normal 
comparisons, respectively. 
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Number of users considered 

FIG. 6: Bayesian model selection protocol for comparing the double log-normal, log-normal, and power- 
law with exponent a = 1 distributions. After considering all of the 724 available users, the posterior 
probability of the power-law and log-normal vanishes. 

of Eq. ©; see Appendix lAl for details. 

After determining the parameters of the double log-normal distribution, we conduct KS tests 
and Bayesian model selection as before, and we find that a double log-normal has a posterior 
probability of one when compared with the power-law model (Fig.|5]). In fact, if we consider all 
three candidate models simultaneously, we still find that the posterior probability of the double 
log-normal is one (Fig.©. 



B. Alternative definition of the waiting times 

Recently, Barabasi and co-workers have reinterpreted the definition of the waiting 

times introduced in Ref. Ill 311 . Barabasi and co-workers note that the actual waiting time should 
not be counted from the time the original e-mail was sent, but from the time the original e-mail 
was first read. This appears perfectly logical, but the database under investigation does not provide 
us with information on when the user actually first read the e-mail. In fact, as we explained earlier 
the database does not even provide information that would enable one to decide whether an e-mail 
is a reply to a previous message or whether it is a totally unrelated message. 

Nonetheless, it is worthwhile to analyze in greater detail the manner in which the authors of 
Refs. QQ measure the waiting time r w since they characterize it as an improvement over the 
original method [13]. At time t\ user A sends an e-mail to user B. At time t 2 > t\, user B sends an 
e-mail. At time £ 3 > t 2 , user B sends an e-mail to user A. The "real" waiting time is now defined 
as r r = t 3 — t 2 , instead of t w = t a — t\. Note that t 2 still is not the actual time when the user 
actually first read the e-mail. 

We find three major problems with the reported predictive ability of the priority queuing model 
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FIG. 7: Apparent agreement of new waiting time measure with priority queuing model prediction (p = 
0.999999,L = 2) in Figs, la-b of Ref. [20]. A, Reproduction of the empirical probability density (open 
circles) and purported model solution (red line) from Fi g, l a of Ref. lEoll using VistaMetrix Jl^l . To match 
the model with the empirical data, the authors of Ref. 12011 claim that r r = is actually r r = 1 (arrow). 
Moreover, the authors of Ref. [20] do not use the actual model solution from Ref. [21] (blue dashed line) 
as claimed. B, Probability function for the empirical waiting times, the purported model prediction, and the 
actual model solution. Even if the purported model solution was correct, it is visually apparent that it does 
not match the large gap in waiting times between t w = 1 and t w = 60 seconds. 

to capture the peak, the power-law regime, and the exponential cut-off of the waiting time distri- 
butions. First, we are troubled that the "agreement" for the peak at r r — 1 is obtained by making 
the transformation r r = 1 if t 3 = t 2 , instead of r r = as would be expected from the definition. 
In other words, to match the peak at r r = 1, Barabasi and co-workers state that = 1. 

Moreover, we are surprised that Barabasi and co-workers claim to use the exact model solution 
to predict the empirical waiting times. Unlike Fig. la of Ref. [20], the exact probability density has 
a large, discontinuous drop at r r = 1 [21]. When we compare the data presented in Ref. [Q] with 
the actual solution, it is clear that the model does not, in fact, match the empirical data (Fig.|7K)- 

Finally, there are no waiting time values for r w between 1 and 60 seconds whereas the prior- 
ity queuing model predicts a smooth continuous decrease of the probability density function in 
that region. While the difference between the two functions is difficult to discern in the plot of 
Ref. [19], the difference is actually quite marked (Fig.|7jB). 
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FIG. 8: Comparison of time series of a typical user versus the time series assumed in the priority queuing 
model. The time series of 100 activities for an actual user is quite different than the time series for 100 
activities for the priority queuing model. In the priority queuing model, one task is executed at each time 
step causing the interevent times to be distributed according to a Dirac-delta function. As demonstrated in 
Section|ffl]and Ref. llUl . the interevent times are distributed with a heavy-tail. 

V. THE PRIORITY QUEUING MODEL 

We also examined the priority queuing model presented to explain the reported power-law in 
e-mail communication 11311 . This model is defined as follows. An individual has a priority queue 
with L tasks. Each task is assigned a priority x drawn from a uniform distribution p{x) = U[0, 1]. 
At each unit time step, the user executes either the highest-priority task with probability p or 
a randomly selected task with probability 1 — p. The executed task is then removed from the 
queue and a new task with priority x, again drawn from p(x), is added to the queue. For the sake 
of comparison of the model predictions with the empirical data, Barabasi surmised that a user's 
queue consists of e-mails which require a response. The model thus predicts the time r w that a 
message spends in the user's inbox prior to response. 

We first address the deficiencies in the model's assumptions. First, humans can only process 
a handful of pieces of information at any time [22]. However, many users of e-mail hold tens, 
hundreds, or even thousands of e-mails in their inbox which may require action. It is therefore 
unrealistic to expect any user to account for each task's priority or to be able to carefully determine 
the absolute (or even the relative) priority of such a large number of tasks. 

Secondly, the priority queuing model does not account for the heterogeneities in interevent 
times revealed by our analysis in Section [Hi] and reported in Ref. ifnl . In the priority queuing 
model, tasks are executed at each time step which means that the distribution of interevent times 
can be described by Dirac's delta function (Fig. [8]). 
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FIG. 9: Deducing the steady state behavior of the priority queuing model. A-D The cumulative distribution 
of priorities x in queue after 10' tasks have been executed where i = 0, 1, 2, • • ■ , 9. The author of Ref. [ 13], 
considers the case of (D) L = 100 and p = 0.99999 after 10 6 tasks have been executed (red dashed line). In 
this case, however, the model has not even reached steady-state. The important thing to notice is that after a 
short transient time, the priorities become uniformly distributed in the very small interval [0, 1 — p] denoted 
with the gray shading. When new tasks are added to the queue from a uniform distribution on the interval 
[0, 1], the vast majority of new tasks are executed immediately. This feature of the priority queuing model 
is not representative of human behavior. 

The priority queuing model also suffers from several unrealistic predictions. First, the time for 
the model to reach steady-state increases as Lj (1—p) [21]. This means that for the case considered 
in Ref. (L = 100, p = 0.99999), the time to reach steady-state is on the order of 10 7 tasks. If 
a user operating according to those parameter values sends 100 e-mails a day (a very large number 
of e-mails), it would take him 100,000 days ~ 300 years to reach steady-state. It is also worthwhile 
to note that the results for the model in Ref. lol were not even obtained for steady-state, implying 
that the data is actually a mixture of different stages of the relaxation process of the model. 

Second, after reaching steady-state, the dynamics of the model become quite anomalous. The 
priorities of the tasks in the user's queue converge to a uniform distribution f7[0, 1 — p] (Fig.©, 



14 



L = 2 



L = 100 




Waiting time 

FIG. 10: Contrasting the transient and steady-state behavior of the priority queuing model. A-D, We plot 
the distribution of waiting times during the transient period for times t < 0.1 L/(l — p) (red) and steady- 
state for times t > 10 L/(l — p) (black). Notice that the likelihood that a task is executed after spending 
more than one unit of time on the queue is vanishingly small. 



while new tasks arrive with a priority drawn from Z7[0, 1]. Thus, in the limit p — > 1, an e-mail user 
has a queue consisting of extremely low priority tasks and consequently performs new tasks with 
probability p — > 1 immediately upon arrival. This results in a peak at r w = 1 that accounts for 
nearly all of the probability mass (Fig. FTOb . Clearly this situation is not representative of e-mail 
activity, let alone human behavior as Ref. II 1 3fl claims. 

More fundamentally, the priority queuing model predicts a distribution of waiting times that 
decays as a power-law with an exponent a — 1 H19L 12111 whereas the actual data clearly rejects 
that description; cf . Section |IVJ In fact, a superposition of two log-normals, one corresponding 
to waiting times of less than a day and another corresponding to a waiting time of several days 
provides an excellent description of the empirical data. More importantly, that description agrees 
with the common experience of e-mails users: one replies to e-mails within the day, if not, within 
the next few days, and if not then, never. 
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VI. CONCLUSIONS 



Here, we have quantitatively analyzed human e-mail communication patterns. In particular, 
we have found that the interevent times are well-described by a log-normal distribution while the 
waiting times are well-described by the superposition of two log-normal distributions. We have 
simultaneously rejected the hypothesis that either quantity is adequately described by a truncated 
power-law with exponent a ~ 1 . 

We have also critically examined the priority queuing model proposed by Barabasi to match 
the empirically observed waiting time distributions. After detailed analysis, we conclude that 
neither the assumptions nor the predictions of the model are plausible. We note that the model 
does not match the empirically observed waiting time distribution, and we therefore contend that 
the theoretical description of human dynamics is an open problem. 

Barabasi and coworkers have also examined the dynamics of letter writing 112311 . web browsing, 
library loans, and stock broker transactions II 1911 . They argue that these processes also follow 
power-law distributions and are consequences of similar priority queuing processes. Our analysis 
demonstrates that care must be taken when describing data with fat-tails, particularly when the 
apparent scaling exponent is close to one and the probability distribution is concave. 

APPENDIX A: MAXIMUM LIKELIHOOD ESTIMATION 

In maximum likelihood estimation, the likelihood function C for distribution model Aij given 
the data {u W)i } is 

N 

C (Mj\ {u W)i }) = J[p (u W)k \Mj) , (Al) 

k=l 

where iV is the number of data points in the sample, the } are me empirical data points 
and p{u w \M.j) is the probability density function for the candidate model Aij evaluated at each 
empirical data point. We then maximize the likelihood C to find the parametrization of the 
model distribution p(u w \Aij) that best approximates the data. For the double log-normal model, 
p(tt™| double log-normal) = p(u w ; f, /ii, a%, H2, cr 2 ). To find the best estimate of the five pa- 
rameters {/, fii, <j\, fJ,2, 02 }> we obtain preliminary estimates for [i\ and 0\ from the mean and 
standard deviation of {«„,_,} and subsequently maximize likelihood C to find the appropriate 
double log-normal parameters. In practice, however, one typically performs a minimization of 
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